In [1]:
import graphlab

In [2]:
song_data = graphlab.SFrame('song_data.gl/')


[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1489735497.log

In [3]:
song_data.head()


Out[3]:
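+----------------------------------------------+--------------------+--------------+
| user_id                                      | song_id            | listen_count |
+----------------------------------------------+--------------------+--------------+
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOAKIMP12A8C130995 | 1            |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBBMDR12A8C13253B | 2            |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBXHDL12A81C204C0 | 1            |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBYHAJ12A6701BF1D | 1            |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODACBL12A8C13C273 | 1            |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODDNQT12A6D4F5F7E | 5            |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODXRTY12AB0180F3B | 1            |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFGUAY12AB017B0A8 | 1            |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFRQTD12A81C233C0 | 1            |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOHQWYZ12A6D4FA701 | 1            |
+----------------------------------------------+--------------------+--------------+
+-----------------------------------+-------------------------------------------------+
| title                             | artist                                          |
+-----------------------------------+-------------------------------------------------+
| The Cove                          | Jack Johnson                                    |
| Entre Dos Aguas                   | Paco De Lucia                                   |
| Stronger                          | Kanye West                                      |
| Constellations                    | Jack Johnson                                    |
| Learn To Fly                      | Foo Fighters                                    |
| Apuesta Por El Rock 'N' Roll ...  | Héroes del Silencio                             |
| Paper Gangsta                     | Lady GaGa                                       |
| Stacked Actors                    | Foo Fighters                                    |
| Sehr kosmisch                     | Harmonia                                        |
| Heaven's gonna burn your eyes ... | Thievery Corporation feat. Emiliana Torrini ... |
+-----------------------------------+-------------------------------------------------+
+-----------------------------------------------+
| song                                          |
+-----------------------------------------------+
| The Cove - Jack Johnson                       |
| Entre Dos Aguas - Paco De Lucia ...           |
| Stronger - Kanye West                         |
| Constellations - Jack Johnson ...             |
| Learn To Fly - Foo Fighters ...               |
| Apuesta Por El Rock 'N' Roll - Héroes del ... |
| Paper Gangsta - Lady GaGa                     |
| Stacked Actors - Foo Fighters ...             |
| Sehr kosmisch - Harmonia                      |
| Heaven's gonna burn your eyes - Thievery ...  |
+-----------------------------------------------+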
[10 rows x 6 columns]


In [4]:
graphlab.canvas.set_target('ipynb')

In [5]:
song_data['song'].show()


Number of users


In [6]:
users = song_data['user_id'].unique()

In [8]:
len(users)


Out[8]:
66346

Building the models


In [9]:
train_data, test_data = song_data.random_split(.8, seed=0)
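
The split works on rows, not on users, so the two parts are only approximately 80/20 and most users appear in both (the training log reports 66085 of the 66346 users). A rough plain-Python sketch of the idea, assuming each row goes to the first part independently with the given probability. `split_rows` and the toy rows are hypothetical, and GraphLab's own random generator is not reproduced, so the same seed will not give the same partition.

```python
import random

def split_rows(rows, fraction, seed=0):
    """Send each row to the first list with probability `fraction` (assumed behaviour)."""
    rng = random.Random(seed)  # seeded so the split is repeatable
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

# Hypothetical toy interactions: (user_id, song)
rows = [("u%d" % (i % 50), "song%d" % (i % 7)) for i in range(1000)]
train_rows, test_rows = split_rows(rows, 0.8, seed=0)
print("train: %d rows, test: %d rows" % (len(train_rows), len(test_rows)))
```

Fixing the seed matters here because both models below are trained and evaluated on the same partition.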

Simple popularity-based recommender


In [12]:
popularity_model = graphlab.popularity_recommender.create(train_data, user_id='user_id', item_id='song')


Recsys training: model = popularity
Warning: Ignoring columns song_id, listen_count, title, artist;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 893580 observations with 66085 users and 9952 items.
    Data prepared in: 2.56009s
893580 observations to process; with 9952 unique items.

In [13]:
popularity_model.recommend(users=[users[0]])


Out[13]:
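+----------------------------------------------+-----------------------------------------------------+--------+------+
| user_id                                      | song                                                | score  | rank |
+----------------------------------------------+-----------------------------------------------------+--------+------+
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Sehr kosmisch - Harmonia                            | 4754.0 | 1    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Undo - Björk                                        | 4227.0 | 2    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | You're The One - Dwight Yoakam ...                  | 3781.0 | 3    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Dog Days Are Over (Radio Edit) - Florence + The ... | 3633.0 | 4    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Revelry - Kings Of Leon                             | 3527.0 | 5    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Horn Concerto No. 4 in E flat K495: II. Romance ... | 3161.0 | 6    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Secrets - OneRepublic                               | 3148.0 | 7    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Fireflies - Charttraxx Karaoke ...                  | 2532.0 | 8    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Tive Sim - Cartola                                  | 2521.0 | 9    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Drop The World - Lil Wayne / Eminem ...             | 2053.0 | 10   |
+----------------------------------------------+-----------------------------------------------------+--------+------+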
[10 rows x 4 columns]
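
A popularity model gives essentially the same ranking to everyone. The scores above look like per-song counts of training rows, but that reading is an assumption and the log does not state it. A minimal plain-Python sketch of that idea, with a hypothetical `popularity_recommend` helper and toy data, which also skips songs the user already has:

```python
from collections import Counter

def popularity_recommend(interactions, user, k=10):
    """Rank songs by number of (user, song) rows, skipping songs `user` already has."""
    counts = Counter(song for _, song in interactions)
    seen = set(song for u, song in interactions if u == user)
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(song, n) for song, n in ranked if song not in seen][:k]

# Hypothetical toy interactions: (user_id, song)
toy = [("a", "s1"), ("b", "s1"), ("c", "s1"),
       ("a", "s2"), ("b", "s2"), ("c", "s3")]
print(popularity_recommend(toy, "a", k=2))
print(popularity_recommend(toy, "new_user", k=2))
```

The only personal element is the filter on already-known songs, which is why a model that looks at listening history is tried next.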

Build a personalized song recommender


In [14]:
personalized_model = graphlab.item_similarity_recommender.create(train_data, user_id='user_id', item_id='song')


Recsys training: model = item_similarity
Warning: Ignoring columns song_id, listen_count, title, artist;
    To use one of these as a target column, set target = 
    and use a method that allows the use of a target.
Preparing data set.
    Data has 893580 observations with 66085 users and 9952 items.
    Data prepared in: 2.77412s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 7.116ms                        | 1.5        |
| 151.047ms                      | 100        |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 913.525ms                           | 0                | 0               |
| 1.25s                               | 41.75            | 4173            |
| 1.59s                               | 68.25            | 6800            |
| 1.93s                               | 99.5             | 9913            |
| 4.87s                               | 100              | 9952            |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 6.09529s

In [15]:
personalized_model.recommend(users=[users[0]])


Out[15]:
+----------------------------------------------+----------------------------------------------+-----------------+------+
| user_id                                      | song                                         | score           | rank |
+----------------------------------------------+----------------------------------------------+-----------------+------+
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Cuando Pase El Temblor - Soda Stereo ...     | 0.0194504536115 | 1    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Fireflies - Charttraxx Karaoke ...           | 0.0144737317012 | 2    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Love Is A Losing Game - Amy Winehouse ...    | 0.0142865960415 | 3    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Marry Me - Train                             | 0.014133471709  | 4    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Secrets - OneRepublic                        | 0.013591665488  | 5    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Sehr kosmisch - Harmonia                     | 0.0133987894425 | 6    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Te Hacen Falta Vitaminas - Soda Stereo ...   | 0.0129302831796 | 7    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | OMG - Usher featuring will.i.am ...          | 0.0127778282532 | 8    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | Y solo se me ocurre amarte (Unplugged) - ... | 0.0123411279458 | 9    |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | No Dejes Que... - Caifanes ...               | 0.0121042499175 | 10   |
+----------------------------------------------+----------------------------------------------+-----------------+------+
[10 rows x 4 columns]


In [16]:
personalized_model.recommend(users=[users[1]])


Out[16]:
+----------------------------------------------+-----------------------------------------------------+-----------------+------+
| user_id                                      | song                                                | score           | rank |
+----------------------------------------------+-----------------------------------------------------+-----------------+------+
| 02f015d32ac2cd1e52d26e3ec36048711dd5711b ... | Where The Boat Leaves From (Album) - Zac Brown ...  | 0.0615360885859 | 1    |
| 02f015d32ac2cd1e52d26e3ec36048711dd5711b ... | Different Kind Of Fine (Album) - Zac Brown Band ... | 0.0605283752084 | 2    |
| 02f015d32ac2cd1e52d26e3ec36048711dd5711b ... | Jolene (Album) - Zac Brown Band ...                 | 0.0578682050109 | 3    |
| 02f015d32ac2cd1e52d26e3ec36048711dd5711b ... | Sic 'Em On A Chicken (Album) - Zac Brown Band ...   | 0.0551866963506 | 4    |
| 02f015d32ac2cd1e52d26e3ec36048711dd5711b ... | Who's Kissing You Tonight - Jason Aldean ...        | 0.0530633330345 | 5    |
| 02f015d32ac2cd1e52d26e3ec36048711dd5711b ... | What Country Is - Luke Bryan ...                    | 0.0374908074737 | 6    |
| 02f015d32ac2cd1e52d26e3ec36048711dd5711b ... | Highway 20 Ride (Album) - Zac Brown Band ...        | 0.0373315736651 | 7    |
| 02f015d32ac2cd1e52d26e3ec36048711dd5711b ... | Do I - Luke Bryan                                   | 0.0330773591995 | 8    |
| 02f015d32ac2cd1e52d26e3ec36048711dd5711b ... | One Fine Wire - Colbie Caillat ...                  | 0.03125         | 9    |
| 02f015d32ac2cd1e52d26e3ec36048711dd5711b ... | Midnight Bottle - Colbie Caillat ...                | 0.0307377055287 | 10   |
+----------------------------------------------+-----------------------------------------------------+-----------------+------+
[10 rows x 4 columns]


In [20]:
personalized_model.get_similar_items(['Do I - Luke Bryan'])


Out[20]:
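+-------------------+-------------------------------------------------+-----------------+------+
| song              | similar                                         | score           | rank |
+-------------------+-------------------------------------------------+-----------------+------+
| Do I - Luke Bryan | Someone Else Calling You Baby - Luke Bryan ...  | 0.147887349129  | 1    |
| Do I - Luke Bryan | Rain Is A Good Thing - Luke Bryan ...           | 0.12222224474   | 2    |
| Do I - Luke Bryan | All My Friends Say - Luke Bryan ...             | 0.116731524467  | 3    |
| Do I - Luke Bryan | What Country Is - Luke Bryan ...                | 0.0869565010071 | 4    |
| Do I - Luke Bryan | A Little More Country Than That - Easton Co ... | 0.06713783741   | 5    |
| Do I - Luke Bryan | People Are Crazy - Billy Curringham ...         | 0.0597402453423 | 6    |
| Do I - Luke Bryan | The Man I Want To Be - Chris Young ...          | 0.0557940006256 | 7    |
| Do I - Luke Bryan | Why Don't We Just Dance - Josh Turner ...       | 0.0513513684273 | 8    |
| Do I - Luke Bryan | Big Green Tractor - Jason Aldean ...            | 0.0512820482254 | 9    |
| Do I - Luke Bryan | Good Directions - Billy Currington ...          | 0.0512820482254 | 10   |
+-------------------+-------------------------------------------------+-----------------+------+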
[10 rows x 4 columns]
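
With no target column, song-to-song scores like these can be read as overlap between the listener sets of two songs. The sketch below assumes Jaccard similarity (shared listeners divided by combined listeners), which the output above does not confirm. `jaccard_similar_items` and the toy data are hypothetical and are not the library's implementation.

```python
from collections import defaultdict

def jaccard_similar_items(interactions, item, k=10):
    """Return up to k (song, score) pairs ranked by Jaccard overlap of listener sets."""
    listeners = defaultdict(set)
    for user, song in interactions:
        listeners[song].add(user)
    target = listeners.get(item, set())
    scores = []
    for other, users in listeners.items():
        if other == item:
            continue
        shared = len(target & users)
        if shared:  # songs with no common listener are left out
            scores.append((other, shared / float(len(target | users))))
    # Highest score first, names break ties deterministically.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]

# Hypothetical toy interactions: (user_id, song)
toy = [("a", "s1"), ("b", "s1"), ("c", "s1"),
       ("a", "s2"), ("b", "s2"), ("c", "s3")]
similar = jaccard_similar_items(toy, "s1")
print(similar)
```

Under this reading, a score of about 0.148 would mean that roughly 15% of the people who listened to either song listened to both.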

Comparing the models quantitatively


In [25]:
%matplotlib inline

model_performance = graphlab.recommender.util.compare_models(test_data, [popularity_model, personalized_model], user_sample=0.05)


compare_models: using 2931 users to estimate model performance
PROGRESS: Evaluate model M0
recommendations finished on 1000/2931 queries. users per second: 1246.26
recommendations finished on 2000/2931 queries. users per second: 1250.45
Precision and recall summary statistics by cutoff
+--------+-----------------+------------------+
| cutoff |  mean_precision |   mean_recall    |
+--------+-----------------+------------------+
|   1    | 0.0324121460252 | 0.00818102440416 |
|   2    | 0.0291709314227 | 0.0140074849901  |
|   3    | 0.0252473558512 | 0.0184681613545  |
|   4    |  0.02362674855  | 0.0236977123108  |
|   5    | 0.0224496758785 | 0.0279691270479  |
|   6    | 0.0206414193108 | 0.0312149240501  |
|   7    | 0.0193498074767 | 0.0341957977577  |
|   8    | 0.0185090412828 | 0.0372377111118  |
|   9    | 0.0176276583646 | 0.0397930446293  |
|   10   | 0.0168201978847 | 0.0424160726873  |
+--------+-----------------+------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1
recommendations finished on 1000/2931 queries. users per second: 1231.71
recommendations finished on 2000/2931 queries. users per second: 1190.71
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.20504947117  | 0.0633935523137 |
|   2    |  0.169054930058 | 0.0986681771354 |
|   3    |  0.147844876606 |  0.12538318045  |
|   4    |  0.131780962129 |  0.147150557373 |
|   5    |  0.118799044695 |  0.164068513749 |
|   6    |  0.109405208689 |  0.179810395829 |
|   7    |  0.101720524443 |  0.193726797104 |
|   8    | 0.0943790515183 |  0.204617400114 |
|   9    | 0.0887448349066 |  0.216253849263 |
|   10   |  0.083691572842 |  0.226237717039 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]
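
At every cutoff the item-similarity model (M1) beats the popularity model (M0): at cutoff 1 its mean precision is about 0.205 against 0.032, and at cutoff 10 its mean recall is about 0.226 against 0.042. The sketch below shows how precision and recall at a cutoff k are commonly computed for a single user, assuming precision = hits / k and recall = hits / number of held-out songs. `precision_recall_at_k` and the toy lists are hypothetical; `compare_models` presumably averages such per-user values over the sampled users.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations against held-out songs."""
    top_k = recommended[:k]
    hits = sum(1 for song in top_k if song in relevant)
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

# Hypothetical ranked recommendations and held-out test songs for one user
recommended = ["s1", "s2", "s3", "s4"]
held_out = {"s2", "s9"}
for k in (1, 2, 4):
    p, r = precision_recall_at_k(recommended, held_out, k)
    print("cutoff %d: precision %.2f, recall %.2f" % (k, p, r))
```

Precision tends to fall and recall to rise as the cutoff grows, which matches the trend in both tables.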

